(19) Europäisches Patentamt
     European Patent Office
     Office européen des brevets

(11) EP 0 703 712 A2

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: 27.03.1996 Bulletin 1996/13
(51) Int. Cl.6: H04N 7/50
(21) Application number: 95114785.9
(22) Date of filing: 20.09.1995
(84) Designated Contracting States: DE FR GB
(30) Priority: 23.09.1994 US 311659
(71) Applicant: C-CUBE MICROSYSTEMS, INC., Milpitas, California 95035 (US)
(72) Inventor: Galbi, David E., Sunnyvale, CA 94089 (US)
(74) Representative: Reinhard - Skuhra - Weise & Partner, Friedrichstrasse 31, D-80801 München (DE)

(54) MPEG audio/video decoder

(57) An MPEG audio/video decoder has memories, a signal processing unit (SPU) including a multiplier and a butterfly unit, a main CPU, and a memory controller which are time division multiplexed between decoding video and audio data. For audio decoding, the butterfly unit determines combinations of components of a frequency-domain vector to reduce the number of multiplies required to transform to the time domain (matrixing). Matrixing is interwoven with MPEG filtering to increase throughput of the decoder by increasing parallel use of the multiplier, the butterfly unit, and a memory controller. The decoder includes a degrouping circuit which performs two divisions in three clock cycles to degroup a subband code. Three cycles matches the write time of three components so that subband codes are degrouped and written to memory with a minimum delay. Performing two divides in three clock cycles allows the divider to be smaller. In response to an error signal from a source of an MPEG audio data stream, the decoder replaces data with an error code and temporarily enables error handling. The error code is a valid bit combination rarely found in MPEG audio data frames. During audio decoding with error handling enabled, the decoder checks for the error code and replaces the error code with reconstructed data. Typically, some subband data are replaced with zeros so that an error only changes some of the frequency components.

Description

CROSS-REFERENCE TO RELATED APPLICATION

This is a continuation-in-part of U.S. Pat. App. Serial No. 08/288,652 entitled "A Variable Length Code Decoder for Video Decompression Operations," filed August 10, 1994, which is a continuation of U.S. Pat. App. Serial No. 07/890,732, filed May 28, 1992 (now abandoned), which was a continuation-in-part of U.S. Pat. App. Serial No. 07/669,818, entitled "Decompression Processor for Video Applications," filed March 15, 1991 (now abandoned), all of which are incorporated by reference in their entirety.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

Field of the Invention

This invention relates to decoders for generating audio signals from digital data, and in particular to combined audio 
and video decoding according to the MPEG standard. 

Description of Related Art

The Motion Picture Experts Group (MPEG) developed an international standard (sometimes referred to herein as the "MPEG standard") for representation, compression, and decompression of motion pictures and associated audio on digital media. The International Standards Organization (ISO) publication No. ISO/IEC 11172: 1993 (E), entitled "Coding for Moving Pictures and Associated Audio - for digital storage media at up to about 1.5 Mbit/s," describes the MPEG standard and is incorporated by reference herein in its entirety. The MPEG standard specifies coded digital representations of audio and video and is intended for continuous data transfer from equipment such as compact disks, digital audio tapes, or magnetic hard disks, at rates up to 1.5 Mbits per second.

Under the MPEG standard, parallel data streams or time multiplexed data streams provide video data frames and audio data frames. Methods and systems for decompressing video data frames are described in U.S. patent applications serial Nos. 07/890,732 and 07/669,818 which were incorporated by reference above. Audio data frames contain a header, side information, and subband data. Subband data indicate frequency-domain vectors that are converted to time-domain output sound amplitudes by a transformation (matrixing) and a smoothing filter (windowing).

Typically, MPEG audio/video decoding systems include two decoders, one for audio decoding and one for video decoding, on two separate integrated circuit chips. The audio decoder and video decoder are separated because of the differences between MPEG audio coding techniques and MPEG video coding techniques, but separate audio and video decoders increase the amount of circuitry in and the cost of an audio/video decoding system. A decoding architecture is needed that reduces the amount of circuitry and the cost of decoding MPEG audio and video data.

SUMMARY OF THE INVENTION

In accordance with this invention, an MPEG audio/video decoder integrated on a single chip uses components such as memories, a main CPU, a memory controller, and a signal processing unit (SPU) for both audio and video decoding. The SPU contains a multiplier (or multiply-and-accumulate unit) and a butterfly unit which together alternately decode video data and audio data. The combination of a multiplier and a butterfly unit is efficient for both audio and video decoding. In particular, for audio decoding, determining particular sums and differences of the components of a frequency-domain vector with a butterfly unit reduces the number of multiplies required for matrixing (i.e. determining a component of a time-domain vector from a frequency-domain sample vector). Determining combinations of the components can be performed in series with dequantizing and descaling of the components combined. Additionally, matrixing and windowing (i.e. combining a present time-domain vector with previous time-domain vectors) are combined in a single instruction to increase throughput of a decoder by increasing parallel use of the multiplier, the butterfly unit, and a memory controller which reads and writes to an external memory.

Also in accordance with this invention, a degrouping circuit for decoding MPEG standard subband codes includes a divider which uses three clock cycles to perform two divisions which convert an MPEG subband code into three vector components. Performing two divides in three clock cycles instead of two allows the divider to be smaller and less costly, but does not slow decoding because three clock cycles is the time required to write three vector components into a single-port memory. Accordingly, the smaller divider does not significantly increase the time required to degroup subband codes and write the resulting components into memory. Using the known limits on input dividends of the divider, the size and cost of the divider can be further reduced from that of a general purpose divider.

Also in accordance with this invention, in response to an error signal from an external source of an MPEG audio data stream, an MPEG audio decoder replaces erroneous data in the audio data stream with an error code which is a bit combination rarely found in MPEG audio data frames, and then temporarily enables error handling. The audio data stream containing error codes can be saved or buffered in the decoder. During audio decoding with error handling enabled, the decoder checks the audio data for the bit combination equaling the error code and replaces the bit combination with reconstructed data. The replacement attempts to minimize the audible effects of an error. Typically, some subband data are replaced with zeros so that an error causes some of the frequency components to be lost.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows a block diagram of an MPEG audio/video decoder in accordance with an embodiment of this invention.
Fig. 2 shows a block diagram of a degrouping circuit in accordance with an embodiment of this invention.
Figs. 3A, 3B, and 3C show a block diagram, a logic table, and a gate level diagram of a divide-by-three circuit in accordance with this invention.
Figs. 4A, 4B, and 4C show a block diagram, a logic table, and a gate level diagram of a divide-by-five circuit in accordance with this invention.
Figs. 5A and 5B show a block diagram of another embodiment of a degrouping circuit and a gate level diagram of an address generator for dividing by three, five, or nine in accordance with this invention.
Fig. 6 shows memory maps of previous vector components used during a windowing process in accordance with this invention.
Figs. 7A, 7B, and 7C show a block diagram of an embodiment of a signal processing unit in accordance with an embodiment of this invention.
Fig. 8A shows a flow diagram of an audio decoding process in accordance with this invention.
Fig. 8B shows a timing diagram for the process of Fig. 8A.

Use of the same reference symbols in different figures indicates similar or identical elements.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

In accordance with this invention, an audio/video decoder decodes MPEG standard data streams to provide an audio signal and a video signal. The audio/video decoder alternates between decoding video data frames and decoding audio data frames and employs the same memories and signal processing unit (SPU) for both audio and video decoding.

Fig. 1 shows a block diagram of an audio/video decoder 100 for decoding MPEG standard audio and video data frames. Decoder 100 receives MPEG standard coded audio and video data via a serial bus 104, decodes the audio and video data, and provides the decoded data over a video bus 176 and an audio bus 192. Decoder 100 includes static random access memories (SRAMs) 134 to 137 (also referred to herein as ZMEM 134, QMEM 135, TMEM 136 and PMEM 137) which alternate between holding video data for video decoding and holding audio data for audio decoding, and a signal processing unit (SPU) 140 which includes an instruction memory, a register file, a multiplier or a multiply-and-accumulate unit (MAC), and a butterfly unit for decoding and decompressing video data or audio data depending on whether decoder 100 is currently decoding video or audio.

Audio/video decoder 100 interfaces with sources of audio and video signals such as a host computer and a compact disk digital signal processor (CD-DSP) over a host bus 102 and serial bus 104. Serial bus 104 carries a stream of compressed audio and video data following the MPEG standard, which decoder 100 receives through a first-in-first-out (FIFO) buffer 115 ("code FIFO 115"). A memory controller 180 reads the compressed data from code FIFO 115 via a main bus 155 and writes the compressed data to an external memory 160 (also referred to herein as DRAM 160). As disclosed below, an audio error code injector 118 can inject error codes into audio data written to DRAM 160. A central processing unit (CPU) 150, which is a microcoded processor having its own instruction memory, controls access to main bus 155 and, in particular, sends commands to memory controller 180 which cause the data transfer from code FIFO 115. In this embodiment, DRAM 160 contains dynamic random access memory (DRAM) components. Other suitable memory technologies can also be used. DRAM 160 holds compressed data from serial bus 104 and decompressed data for output to an audio bus 192 or a video bus 176. Under the direction of CPU 150, memory controller 180 transfers compressed audio or video data to a decoder FIFO 125 for decoding of an audio data frame or a video data frame by SPU 140.

According to the MPEG standard, a video data frame is a compressed digital description of a picture and an audio data frame is a digital description of a fixed number of frequency-domain sound samples in up to two sound channels. The MPEG standard for video data frames and decoding of video data frames to produce a video signal are described in U.S. patent applications serial Nos. 07/890,732 and 07/669,818 which were incorporated by reference above. The MPEG standard currently defines three types of audio data frames referred to as layer 1, layer 2, and layer 3 data frames. Decoder 100 in Fig. 1 decodes layer 1 and layer 2 audio data frames. Layer 1 and layer 2 audio data frames contain a header, side information, and subband data. The header indicates: the bitrate of the data stream providing the audio data frames; the sample frequency of the decoded sound; whether the subband data contains one or two sound channels; and a mode extension describing whether the sound channels in the subband data are independent, stereo, or intensity stereo. The side information indicates the number of bits allocated per subband in the subband data and an index to scalefactors F for dequantizing and descaling subband data as described below.

CPU 150 controls the percentage of time SPU 140 spends decoding audio data. For audio decoding, CPU 150 directs memory controller 180 to move audio data from DRAM 160 to decoder FIFO 125 and directs SPU 140 to perform the calculations necessary for decoding audio data. SPU 140 operates in parallel with CPU 150 and executes commands according to software stored in an instruction memory in SPU 140.

When decoding an audio data frame, SPU 140 first executes a "get bits" command which loads the header and side information of the audio data frame, from decoder FIFO buffer 125, through a VLC/FLC decoder 120, into CPU 150. The CPU 150 writes bit allocations and scalefactors from the side information, through SPU 140, into QMEM 135. Header and side information pass through VLC/FLC decoder 120 unchanged. Subband data follows the side information in the data stream from decoder FIFO buffer 125. VLC/FLC decoder 120 contains circuits for decoding variable length codes (VLC) in video data and fixed length codes (FLC) in audio and video data. VLC/FLC decoder 120 also contains degrouping circuits for audio data as described below.

A "get subbands" command executed by SPU 140 causes VLC/FLC decoder 120 to parse and convert subband codes Ci from decoder FIFO buffer 125 into 192 scaled and quantized components Si". VLC/FLC decoder 120 performs degrouping as required and writes the scaled and quantized components Si" into ZMEM 134. Each frequency-domain vector S" has 32 components Si" in 32 frequency ranges (subbands i). The "get subbands" command writes components Si" for three frequency-domain vectors S" in each channel (six vectors S" total for two channels) in ZMEM 134. For intensity stereo, some of the frequency components Si" are used by both channels. VLC/FLC decoder 120 writes two copies of components that are shared by the channels so that each vector S" in ZMEM 134 has 32 components Si". For monophonic sound, VLC/FLC decoder 120 can write two copies of all components Si" so that both channels of a stereo output signal are the same. The number of vectors S" in an audio data frame depends on the number of channels and whether the audio data frame follows layer 1 or layer 2 of the MPEG standard. Under layer 1, there are 12 vectors S" (384 samples) per channel. Under layer 2, there are 36 vectors S" (1152 samples) per channel.
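
The counts quoted above all follow from the 32 subband components in each vector S". As a quick arithmetic check, the sketch below recomputes them; the function and constant names are illustrative and do not come from the decoder itself.

```python
SUBBANDS = 32  # components Si" per frequency-domain vector S"


def samples_per_channel(vectors_per_frame):
    """Samples per channel in one audio data frame."""
    return vectors_per_frame * SUBBANDS


def components_per_get_subbands(channels=2, vectors_per_channel=3):
    """Components written to ZMEM by one "get subbands" command."""
    return channels * vectors_per_channel * SUBBANDS


layer1 = samples_per_channel(12)        # 12 vectors per channel under layer 1
layer2 = samples_per_channel(36)        # 36 vectors per channel under layer 2
per_command = components_per_get_subbands()
```

With two channels and three vectors per channel, one "get subbands" command accounts for the 192 components mentioned in the text.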

SPU 140 executes a "dequant/descale" command to generate components Si of frequency-domain vectors S by descaling and dequantizing values Si" from ZMEM 134. SPU 140 writes a representation of a vector S to a portion of TMEM 136. Matrixing as described below transforms a frequency-domain vector S to a time-domain vector V. SPU 140 stores components Vi of a time-domain vector V in PMEM 137, and memory controller 180 writes components Vi from PMEM 137 to DRAM 160. Components from 16 consecutive time-domain vectors V^0 to V^-15 from DRAM 160 are combined in a windowing process described below, and the combination is accumulated in TMEM 136 to provide 32 time-domain output sound amplitudes Ai. Time-domain output sound amplitudes Ai are typically written to an audio output FIFO buffer in DRAM 160, and written as required from DRAM 160 through main bus 155, an output audio FIFO 190, and an audio serializer 191 to audio output bus 192. Output audio FIFO buffer 190 holds enough output sound amplitude values so that, at the fastest sampling rate expected, delayed access to main bus 155 does not interrupt sound. Audio serializer 191 converts the output audio data to a serial data stream, and a digital-to-analog converter (DAC) and amplifier (not shown) generate a sound from the audio data.

The side information indicates the number of possible values for each quantized component Si" (and each subband code Ci) in a subband i. For example, if subband codes Ci in subband i have 0, 2, 4 or 2^N possible values, then 0, 1, 2 or N bits are used for each code Ci. If no bits are used for a subband i, VLC/FLC decoder 120 writes zero into ZMEM 134 for components Si", and vector S has fewer than 32 non-zero components. For a bit allocation representing 2^N possible values for a subband i, VLC/FLC decoder 120 uses the bit allocations from the side information in QMEM 135 to identify the start and end of a component Si" in the data stream and writes component Si" to a word aligned location in ZMEM 134.

The MPEG standard allows components Si" to have 3, 5, or 9 possible values and encodes three components S1i", S2i", and S3i" from subband i of three different vectors S1, S2, and S3 into a single code Ci. For example, there are 27 possible combinations of three quantized and scaled components S1i", S2i", and S3i" if each has three possible values 0, 1, or 2. A 5-bit subband code Ci given by eq. 1 represents the 27 possible combinations.

Ci = 3^2·S3i" + 3·S2i" + S1i"    (eq. 1)

Similarly, a 7-bit code Ci given by eq. 2 represents three components S1i", S2i", and S3i" having five possible values 0 to 4 each.

Ci = 5^2·S3i" + 5·S2i" + S1i"    (eq. 2)

Eq. 3 gives a 10-bit code Ci representing three components S1i", S2i", and S3i" which each have 9 possible values 0 to 8.

Ci = 9^2·S3i" + 9·S2i" + S1i"    (eq. 3)
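
The code widths of 5, 7, and 10 bits follow from the number of combinations (27, 125, and 729) that eqs. 1 to 3 must represent. The short Python sketch below shows the grouping and the width calculation; `group` and `grouped_code_bits` are illustrative names, not part of the decoder.

```python
def group(s1, s2, s3, levels):
    """Combine three components (each in 0..levels-1) into one code, as in eqs. 1 to 3."""
    return levels * levels * s3 + levels * s2 + s1


def grouped_code_bits(levels):
    """Smallest number of bits that can hold every code for the given number of levels."""
    largest_code = levels ** 3 - 1
    return largest_code.bit_length()


widths = {levels: grouped_code_bits(levels) for levels in (3, 5, 9)}
largest_three_level_code = group(2, 2, 2, 3)
```

Grouping saves bits compared with coding each component separately: three 3-level components would need 2 bits each (6 bits total) instead of 5.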

VLC/FLC decoder 120 degroups a code Ci into three components S3i", S2i", and S1i" given by eqs. 1 to 3 before writing the scaled and quantized components S3i", S2i" and S1i" to ZMEM 134. Two divisions are sufficient to degroup a code Ci given by eqs. 1 to 3. For example, if Ci = x^2·S3i" + x·S2i" + S1i" and components S3i", S2i" and S1i" are less than x, dividing Ci by x provides a quotient Q1 and a remainder R1 given by eq. 4.

(Ci/x) = Q1 = x·S3i" + S2i" with remainder R1 = S1i"    (eq. 4)

Dividing by x again provides a quotient Q2 and a remainder R2 given by eq. 5.

(Q1/x) = Q2 = S3i" with remainder R2 = S2i"    (eq. 5)

If ZMEM 134 has a single port connected to VLC/FLC decoder 120, three clock cycles are required to write components S3i", S2i", and S1i". Accordingly, VLC/FLC decoder 120 can use three clock cycles for two divides which degroup a code Ci, and not cause a pipeline delay in writing components S3i", S2i", and S1i".
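
Eqs. 4 and 5 amount to two successive integer divisions by x. A minimal Python sketch of that arithmetic is given here; it models only the mathematics, not the hardware divider, and the name `degroup` is illustrative.

```python
def degroup(code, x):
    """Recover (S1, S2, S3) from a grouped code using two divisions by x."""
    q1, s1 = divmod(code, x)   # eq. 4: Q1 = x*S3 + S2, remainder R1 = S1
    s3, s2 = divmod(q1, x)     # eq. 5: Q2 = S3, remainder R2 = S2
    return s1, s2, s3


# Example: with x = 5, the code 5*5*4 + 5*0 + 3 = 103 holds S1 = 3, S2 = 0, S3 = 4.
example = degroup(103, 5)
```

Because each component is less than x, the remainders are unique, so the two divisions always invert the grouping of eqs. 1 to 3.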

Fig. 2 shows decoding circuit 200 which performs two divides for degrouping a code Ci in three clock cycles. The first divide is an extended divide that takes two clock cycles. The second divide takes one clock cycle. Using two clock cycles for the first divide permits use of a smaller divider and reduces cost of VLC/FLC decoder 120. In the embodiment of Fig. 2, a divider 210 receives dividend values from multiplexers 220 and 221 and divides the dividend values by a divisor x equal to 3, 5, or 9 to produce a quotient Q and a remainder Rout. Side information gives the bit allocation for each subband and determines the value of divisor x for each subband which requires degrouping.

Code Ci is partitioned into three parts CiH, CiM, and CiL for the first divide of degrouping. CiL contains the 2, 3, or 4 least significant bits of code Ci for divisor x equal to 3, 5, or 9 respectively. CiM contains the next 2, 3, or 4 more significant bits of code Ci, and CiH contains the most significant 1, 1, or 2 bits of Ci for divisor x equal to 3, 5, or 9 respectively. CiH is padded on the left with zeros to 2, 3, or 4 bits.

Degrouping proceeds as follows. During a first clock cycle, multiplexers 220 and 221 assert signals CiH and CiM to divider 210, and divider 210 produces a quotient Q1H and a remainder R1' which are written to registers 231 and 230 at the end of the first clock cycle. Registers 230 and 231 in the embodiment of Fig. 2 are edge triggered devices, but in alternative embodiments, registers 230 and 231 may be latches, memory locations, or any devices capable of holding and asserting digital data signals. During a second clock cycle, multiplexers 220 and 221 assert respectively remainder R1' from register 230 and signal CiL to divider 210, and divider 210 produces a quotient Q1L and remainder R1. At the end of the second clock cycle, quotient Q1L and remainder R1 are written to registers 231 and 230 respectively, and quotient Q1H is written from register 231 to a register 232. Quotients Q1H and Q1L are respectively the most significant and least significant bits of the quotient Q1 given in eq. 4. Remainder R1 is value S1i" as in eq. 1, 2, or 3.

During a third clock cycle, multiplexers 220 and 221 assert respectively signals Q1H and Q1L from registers 230 and 231 to divider 210, divider 210 produces quotient Q2 and remainder R2 that are given in eq. 5, and a multiplexer 240 selects value R1 from register 230 for writing to a memory such as ZMEM 134 of Fig. 1. At the end of the third clock cycle, quotient Q2 and remainder R2 are written to registers 231 and 230, and the quotient Q1L is written from register 231 to register 232.

During a fourth clock cycle, remainder R2 which equals S2i" passes through multiplexer 240 and is written to the memory. Quotient Q2 is written to register 232 at the end of the fourth clock cycle. Quotient Q2 which equals S3i" is written to memory during the fifth clock cycle. A first divide for a second code Ci' can be performed during the fourth and fifth clock cycles and can proceed as disclosed above. Accordingly, if a series of codes C are degrouped, degrouping proceeds with a pipeline delay only for the first code in the series.
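
The three divider operations can be modeled in software as long division with 2-, 3-, or 4-bit digits. The Python sketch below is an arithmetic model only: it assumes the divider treats its two inputs as the high and low digits of a single dividend, it ignores registers 230 to 232 and the write pipeline, and `CHUNK_BITS` and `degroup_three_cycles` are illustrative names.

```python
# Width in bits of CiL and CiM for each divisor x, as stated in the text.
CHUNK_BITS = {3: 2, 5: 3, 9: 4}


def degroup_three_cycles(code, x):
    """Degroup a code with three narrow divides: an extended divide, then a normal one."""
    k = CHUNK_BITS[x]
    mask = (1 << k) - 1
    ci_l = code & mask           # least significant k bits
    ci_m = (code >> k) & mask    # next k bits
    ci_h = code >> (2 * k)       # remaining 1 or 2 bits, implicitly zero padded

    # Cycle 1: divide CiH:CiM, giving the high digit of Q1 and an interim remainder.
    q1_h, r1_prime = divmod((ci_h << k) | ci_m, x)
    # Cycle 2: divide R1':CiL, giving the low digit of Q1 and remainder R1 = S1.
    q1_l, r1 = divmod((r1_prime << k) | ci_l, x)
    # Cycle 3: divide Q1 = Q1H:Q1L, giving Q2 = S3 and remainder R2 = S2.
    q2, r2 = divmod((q1_h << k) | q1_l, x)
    return r1, r2, q2            # (S1, S2, S3)


example = degroup_three_cycles(26, 3)  # 26 = 9*2 + 3*2 + 2
```

In this model no single divide sees a dividend wider than 2k bits, and the interim remainder is always less than x, which is one way to see why a smaller divider suffices.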

Any known or yet to be developed digital divider circuit may be employed for divider 210 provided the divider circuit handles the correct size dividend, quotient, and remainder. Fig. 3A shows a block diagram of a divide-by-three circuit 300 which uses the limits on the values of codes C to reduce the number of gates and transistors required. Divide-by-three [...]

[...] divide-by-three circuit 300 in response to the corresponding bit allocation of a subband.

[...] S3" and subband codes for two channels [...] provides values S1i", S2i", and S3i" [...] in the worst case, components from six vectors S" (three in each channel) must be [...] in the data stream under the MPEG [...] one complete vector S" is known.

In the embodiment of Fig. 1, after SPU 140 [...] to get subband data, VLC/FLC decoder 120 [...] the bit allocation per subband from [...] subband codes C (if necessary), and [...] quantized components Si" [...] to ZMEM 134 without intervention from SPU 140. This frees SPU 140 [...]. In alternative embodiments, SPU 140 [...] greater control of reading and [...]. For example, SPU 140 can read the [...] of each subband, and VLC/FLC decoder 120 [...] subband in response to separate commands [...] VLC/FLC decoder 120 can also be eliminated if SPU 140 [...] degrouping of values from decoder [...] command because ZMEM [...] not have space for more than [...] described above may perform [...] throughput of decoder 100.

Si' = K1·(Si" + K2)    (eq. 6)

Si = F·Si'    (eq. 7)

[...] Si" and the side information of the [...] Constants K1 and K2 depend on the number of bits [...] a value Si" is sometimes referred to [...] are

Nji = cos[(16+j)(2i+1)π/64]    (eq. 8)

[...] A to Z or AA to AE. Eq. 8 indicates the values A to [...] Vector V has 64 components Vj given by eq. 9.

Vj = Σ_{i=0}^{31} Nji·Si    (eq. 9)

[...] components Vj of vector V are linearly independent. [...] V48 requires 32 multiplications of components [...] each require 16 multiplications. Components with an [...]

[...] Interpolator 173 expands the decoded video data by two horizontally and by two vertically. Video overlay data such as data representing lyric text is read from DRAM 160 into overlay FIFO 172, and block 174 blends the overlay data with data from interpolator 173 to provide pixel values. Converter 175 optionally converts the pixel values from a YCbCr color representation to an RGB color representation which is transmitted on video output bus 176.

Figs. 7A, 7B, and 7C show a block diagram of a signal processing unit 140 in accordance with this invention. SPU 140 has an instruction memory (not shown) and a control unit (not shown) which executes a decoding program stored in the instruction memory. SPU 140 decodes audio and video data frames using information stored in memories 134 to 137. Fig. 7A shows portions of SPU 140 for audio decoding. ZMEM 134 is a (3x64)x16 bit SRAM and is large enough to store six vectors S", each containing thirty-two 16-bit components Si", during audio decoding. During video decoding, ZMEM 134 is a "zig-zag" memory which stores two or three sets of 64 9-bit video coefficients. QMEM 135 is a 64x(2x8) memory. During audio decoding, QMEM 135 holds 32 subband bit allocations and scalefactor indices for each of two sound channels. During video decoding, QMEM 135 holds two sets of 64 8-bit components of video quantizer matrices according to the MPEG standard. Quantizer matrices are swapped between QMEM 135 and DRAM 160 as required when switching between video and audio decoding.

For audio decoding, a VLC/FLC decoder 120 writes six quantized and scaled vectors S" to ZMEM 134 as described above. SPU 140 performs a "dequant/descale" instruction and "window/matrix" instructions on each vector S" in ZMEM 134. The dequant/descale instruction determines combinations T0 to T31 from a vector S" by dequantizing, descaling, and butterfly operations. For dequantizing, a 16-bit component Si" from ZMEM 134 is fed through a multiplexer 716 as an input value Z of MAC 750. A multiplexer 706 asserts a value X equal to -1 to a register 707 coupled to MAC 750, and multiplexer 712 asserts a value Y which equals K2 as given in eq. 6, from ROM 732 to a register 713 coupled to MAC 750. MAC 750 determines the product of value X and value Y and then subtracts value Z. A register 717 captures the output value from MAC 750 which can be written to a multiported register file 733 which has three read ports and three write ports. The value Si"+K2 is stored to register file 733. In a second pass through MAC 750, multiplexer 706 asserts signal X equal to Si"+K2 from register file 733, through register 707, to MAC 750. Multiplexer 712 asserts signal Y equal to K1 (eq. 6) from ROM 732, through register 713, to MAC 750. Multiplexer 716 asserts a value Z equal to zero. The output signal of MAC 750 is a dequantized value Si' which is again written to register file 733.

ROM 732 contains two ROMs 732A and 732B which are alternately accessed to provide ROM 732 with twice the read speed of ROMs 732A and 732B. ROM 732 contains constants for dequantizing, descaling, matrixing, windowing, and video decoding. The control unit of SPU 140 determines the correct address in ROM 732 from the side information in QMEM 135.

Dequantized value Si' is asserted through multiplexer 706 as a value X for descaling. Descaling is performed in two multiplications. For the first multiplication, multiplexer 712 and register 713 provide a first scalefactor F1 which is one of 1, 2^(-1/3), and 2^(-2/3) from ROM 732 according to an index from the side information in QMEM 135. Value Z from multiplexer 716 is zero. The resulting partly descaled value is held by register 717, stored to register file 733, and asserted through multiplexer 706 and register 707 as value X for the second multiply. Multiplexer 712 and register 713 provide a second scalefactor F2 which is one of 2^-1 to 2^-20 from ROM 732 according to the index from the side information in QMEM 135, and again value Z is zero. The product of F1 and F2 equals scalefactor F of eq. 7. Descaling with two multiplications reduces round-off error which might result from a single multiplication by a scalefactor F which is small.
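
The two-step descaling can be pictured as splitting the exponent of F into a fractional part (F1) and a whole-number part (F2). The Python sketch below illustrates that split in floating point. It assumes a hypothetical index n with F = 2^(-n/3); the actual mapping from the side-information index to F1 and F2 is held in ROM 732 and is not reproduced here, and the function names are illustrative.

```python
def split_scalefactor(n):
    """Split F = 2**(-n/3) into F1 (one of 1, 2**(-1/3), 2**(-2/3)) and a power of two F2."""
    f1 = 2.0 ** (-(n % 3) / 3.0)   # fractional part of the exponent
    f2 = 2.0 ** (-(n // 3))        # whole part of the exponent
    return f1, f2


def descale(s_dequantized, n):
    """Apply F in two multiplications, as in the two passes through the multiplier."""
    f1, f2 = split_scalefactor(n)
    partly_descaled = f1 * s_dequantized   # first multiplication, by F1
    return f2 * partly_descaled            # second multiplication, by F2


f1_example, f2_example = split_scalefactor(7)   # F = 2**(-7/3) = 2**(-1/3) * 2**(-2)
```

With this split, F1 never falls below about 0.63, so its stored value keeps nearly full precision, while F2 is an exact power of two.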

The dequantized and descaled value Si is written to register file 733, and SPU 140 dequantizes and descales a second component Sk" from ZMEM 134 in the same manner as described above. When component Sk" is dequantized and descaled to provide component Sk, butterfly unit 760 calculates the sum and the difference of Si and Sk. Calculation of sums and differences is conducted in parallel with dequantizing and descaling other components. Components Si" are descaled and dequantized in an order that facilitates calculation of sums and differences T0 to T31 shown in Appendix B.

One example dequantizes and descales components S0, S31, S15, S16, S7, S24, S8, and S23 in that order for determination of sum T28. Butterfly unit 760 determines the sum and difference of S0 and S31 while MAC 750 determines components S15 and S16. A register 725 holds the sum S0+S31 for writing into register file 733. A register 726 holds difference (S0-S31) = T0, which passes through a register 727, a multiplexer 728, an audio clamp 724, and a multiplexer 723 to be written in TMEM 136. Subsequently, butterfly unit 760 determines the difference (S15-S16)=T15 which is similarly stored in TMEM 136 and the sum (S15+S16) which is temporarily stored in register file 733. Next, butterfly unit 760 determines the sum and difference of the sums (S0+S31) and (S15+S16). The difference (S0+S31)-(S15+S16)=T15 is saved to TMEM 136. The sum (S0+S31)+(S15+S16) is temporarily stored in register file 733. The same calculations as performed on S0, S31, S15, and S16 are performed on S7, S24, S8, and S23 to determine (S7-S24)=T7, (S8-S23)=T8, (S7+S24)-(S8+S23)=T23, and (S7+S24)+(S8+S23). Butterfly unit 760 then combines values (S0+S31)+(S15+S16) and (S7+S24)+(S8+S23) from register file 733 to determine difference T24 = [(S0+S31)+(S15+S16)]-[(S7+S24)+(S8+S23)] and sum T28 = [(S0+S31)+(S15+S16)]+[(S7+S24)+(S8+S23)], both of which are stored in TMEM 136. The remaining components of vector S are dequantized in parallel with operation of butterfly unit 760 in the order required to determine sums T29 to T31 of Appendix B.
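
The sequence of sums and differences in this example forms a small tree of butterfly operations. The Python sketch below reproduces only the combinations named in the example above; Appendix B is not reproduced here, so the remaining combinations are omitted, and the difference (S0+S31)-(S15+S16) is returned under a neutral key rather than a T label. The function names are illustrative.

```python
def butterfly(a, b):
    """Return the sum and the difference of two values, as a butterfly unit does."""
    return a + b, a - b


def example_combinations(s):
    """Compute the combinations named in the S0, S31, S15, S16, S7, S24, S8, S23 example."""
    sum_0_31, t0 = butterfly(s[0], s[31])
    sum_15_16, t15 = butterfly(s[15], s[16])
    sum_a, diff_a = butterfly(sum_0_31, sum_15_16)

    sum_7_24, t7 = butterfly(s[7], s[24])
    sum_8_23, t8 = butterfly(s[8], s[23])
    sum_b, t23 = butterfly(sum_7_24, sum_8_23)

    t28, t24 = butterfly(sum_a, sum_b)   # final sum and difference of the two partial sums
    return {"T0": t0, "T15": t15, "T7": t7, "T8": t8, "T23": t23,
            "T24": t24, "T28": t28, "diff_0_31_15_16": diff_a}


result = example_combinations([i * i for i in range(32)])
```

Seven butterfly operations yield all eight values here, and every partial sum is reused by a later stage, so none is computed twice.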

After all combinations T0 to T31 are determined and stored in TMEM 136, SPU 140 executes a window/matrix instruction.
Combinations T0 to T31 are asserted to MAC 750 through multiplexer 706 and register 707. MAC 750 multiplies
combinations T0 to T31 by matrixing coefficients from ROM 732 as given in Appendix C to determine components V17 to
V48. Butterfly unit 760 performs the additions or subtractions needed to accumulate components V17 to V48, which
pass through registers 726 and 727, multiplexer 728, and clamp 729 to be saved in PMEM 137. The number of
multiplications required is 4, 8, or 16 per component Vi depending on the index.
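
The dependence of the multiplication count on the index can be read off the equations of Appendix C: odd-indexed components use T0 to T15, components whose index is 2 modulo 4 use T16 to T23, and components whose index is a multiple of 4 use four of T24 to T31. The sketch below only counts terms; the function names are hypothetical.

```c
/* Multiplications for component Vi (17 <= i <= 48), counted from the
 * number of terms in the corresponding equation of Appendix C. */
int multiplies_for_component(int i)
{
    if (i % 2 != 0) return 16; /* odd index: terms in T0..T15 */
    if (i % 4 != 0) return 8;  /* index 2 mod 4: terms in T16..T23 */
    return 4;                  /* multiple of 4: four terms in T24..T31 */
}

/* Multiplications for four consecutive components starting at 'first'. */
int multiplies_for_set(int first)
{
    int total = 0;
    int i;
    for (i = first; i < first + 4; i++)
        total += multiplies_for_component(i);
    return total;
}
```

Any four consecutive components contain two odd indices, one index that is 2 modulo 4, and one multiple of 4, so every set costs 16 + 8 + 16 + 4 = 44 multiplications.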

Windowing filters the vector components Vi which result from matrixing. For each window/matrix instruction, memory
controller 180 reads a set of 33 previous vector components, as in memory map 630 of DRAM 160, into PMEM 137, with
the 33 previous vector components from the oldest vectors (vectors V^-15 and V^-14) read from DRAM 160 first. The
previous vector components in PMEM 137 are fed through multiplexer 706 and register 707 to MAC 750. MAC 750
multiplies the previous vector components by windowing coefficients D(i+32k) and accumulates the products into the 32
sound amplitude values being accumulated in TMEM 136. For each set of 33 vector components, 64 multiplies are
performed, and two values are accumulated to each sound amplitude value.

Matrixing and windowing of a vector requires eight window/matrix instructions. Each window/matrix instruction
determines and stores into DRAM 160 four components of a vector V^0, and accumulates two windowing contributions for
each of 32 sound amplitude values Ai. Before the first window/matrix instruction, old sound amplitude values Ai must
be saved from TMEM 136 to DRAM 160. Saving old sound amplitude values can be performed simultaneously with
dequantizing and descaling of a new vector if TMEM 136 is dual ported or if writes to TMEM 136 during the dequantizing
and descaling process can be stalled. Otherwise, the window/matrix instruction must wait until old sound amplitude
values are saved to DRAM 160. The window/matrix instruction also must wait until dequantizing and descaling of the
current vector is complete.
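
The per-vector totals implied by these figures can be checked with a little arithmetic. The constant and function names below are hypothetical; the numbers (eight instructions, four components and 64 windowing multiplies per instruction, 32 amplitude values) come from the description.

```c
enum {
    WM_INSTRUCTIONS_PER_VECTOR = 8,         /* window/matrix instructions */
    COMPONENTS_PER_INSTRUCTION = 4,         /* components of V^0 per instruction */
    WINDOW_MULTIPLIES_PER_INSTRUCTION = 64, /* per set of 33 components */
    SOUND_AMPLITUDES = 32                   /* values Ai per vector */
};

/* Components of the current vector produced by matrixing (V17..V48). */
int matrix_components_per_vector(void)
{
    return WM_INSTRUCTIONS_PER_VECTOR * COMPONENTS_PER_INSTRUCTION;
}

/* Windowing multiplies needed for one vector. */
int window_multiplies_per_vector(void)
{
    return WM_INSTRUCTIONS_PER_VECTOR * WINDOW_MULTIPLIES_PER_INSTRUCTION;
}

/* Windowing contributions accumulated into each amplitude value Ai. */
int window_terms_per_amplitude(void)
{
    return window_multiplies_per_vector() / SOUND_AMPLITUDES;
}
```

The result is 32 matrixed components, 512 windowing multiplies, and 16 contributions per amplitude value, that is, two per window/matrix instruction.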

Initially, memory controller 180 transfers 33 vector components from DRAM 160 to a first portion of PMEM 137. For
the MPEG standard, vector components are kept to 20 bits of accuracy, but standard DRAMs have 16-bit storage locations.
Accordingly, 33 vector components are stored at 42 addresses in DRAM 160. PMEM 137 is 18 bits wide for holding two
9-bit video error terms. Vector components are stored in PMEM 137 in 42 addresses as received from DRAM 160. Once
the 33 vector components are in PMEM 137, SPU 140 begins executing a window/matrix instruction on the first portion of
PMEM 137 and retrieves 20-bit components as required. The window/matrix instruction accumulates the windowing
contributions of the 33 components in PMEM 137 to the sound amplitude values Ai in TMEM 136 and determines a set
of four vector components of the current vector V^0. It should be noted that each set of four vector components V17 to
V20, V21 to V24, V25 to V28, V29 to V32, V33 to V36, V37 to V40, V41 to V44, and V45 to V48, if determined by the
equations in Appendix C, requires 44 multiplications. The set of four vector components determined by matrixing is
stored in PMEM 137.
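
The figure of 42 addresses follows from packing 20-bit components densely into 16-bit locations. The helper below is a hypothetical sketch that only counts storage words; it does not reproduce the bit layout actually used in DRAM 160 or PMEM 137.

```c
/* Number of word_bits-wide locations needed to hold 'count' values of
 * bits_each bits when the values are packed without gaps. */
int words_needed(int count, int bits_each, int word_bits)
{
    int total_bits = count * bits_each;
    return (total_bits + word_bits - 1) / word_bits; /* round up */
}
```

For 33 components, 33 x 20 = 660 bits, and 660 / 16 = 41.25, which rounds up to 42 locations. The four components written back after each instruction need 80 bits, or five 16-bit locations.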

Simultaneously with execution of the window/matrix instruction, memory controller 180 transfers 33 more vector
components from DRAM 160 to a second portion of PMEM 137. When a window/matrix instruction is complete, four vector
components are written from PMEM 137 to DRAM 160, and then another window/matrix instruction begins using the
second portion of PMEM 137. The eighth and final window/matrix instruction for a vector V^0 uses components of vector
V^0 for windowing. Since windowing only requires components V^0_17 to V^0_33, the necessary components
are calculated and stored in DRAM 160 in previous window/matrix instructions before being retrieved for
windowing. After the eighth window/matrix instruction, the 32 sound amplitude values Ai are ready for transfer from
TMEM 136 to DRAM 160. Audio clamp 724 clamps the accumulated sound amplitude values Ai to 16 bits for writing to
DRAM 160. If dequantizing and descaling is not stallable, SPU 140 waits while memory controller 180 transfers sound
amplitude values Ai to DRAM 160.

An advantage of the combined window/matrix step arises because matrixing is limited by multiply time, while windowing
is slightly limited by memory access to DRAM 160. Combining windowing and matrixing provides an instruction that more
evenly utilizes the resources of SPU 140 and decoder 100. Additionally, if the windowing and matrixing were not
combined, transfers from DRAM 160 to video FIFOs 171 and 172 (and FIFOs 125, 115, and 190) would delay windowing. By
combining windowing and matrixing, transfers from DRAM 160 to video FIFOs 171 and 172 can overlap the
window/matrix computations because matrixing does not use much DRAM bandwidth. Matrixing only needs to write four
20-bit values to DRAM 160.

Fig. 8A illustrates a process loop executed by CPU 150 for audio data frame decoding, and Fig. 8B shows the
timing of the process loop. Initially, in step 805, CPU 150 loads QMEM 135 with scalefactor indices and bit allocations
for a layer 1 audio data frame or for part of a layer 2 audio data frame and then, in step 810, requests that memory
controller 180 transfer 33 vector components from DRAM 160 to PMEM 137. The 33 vector components are transferred
to a first half of PMEM 137 during time T1 (Fig. 8B). Meanwhile, CPU 150 issues a get subbands command in step 820
that VLC/FLC decoder 120 executes in parallel with the transfer during time T1. The get subbands command as disclosed
above moves components for six vectors into ZMEM 134. CPU 150 waits in step 825 until VLC/FLC decoder 120 is idle
before issuing a dequant/descale command in step 835. SPU 140 performs the dequant/descale command in parallel
with the transfer during time T1.

SPU 140 cannot proceed from the dequant/descale command to a window/matrix command until transfer of the 33
vector components requested in step 810 is complete because the 33 vector components are required for windowing.
SPU 140 requires the results of the dequant/descale command (step 835) for matrixing. Additionally, window/matrix
commands cannot begin until previously determined sound amplitude values are saved from TMEM 136 to DRAM 160.
Accordingly, CPU 150 waits in steps 840, 845, and 855 before directing SPU 140 to execute a window/matrix command
in step 860. During a time T2, the window/matrix command is performed as disclosed above. In step 865, CPU 150
requests that memory controller 180 transfer 33 more vector components from DRAM 160 to a second half of PMEM
137. The transfer of 33 more components occurs during time T3 in parallel with the window/matrix command of step
860. Memory controller 180 does not interfere with data being used by SPU 140 because memory controller 180 and
SPU 140 access different halves of PMEM 137.
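
The alternating use of the two halves of PMEM 137 is a double-buffering (ping-pong) scheme. The sketch below models only which half each unit touches on a given window/matrix iteration; the function names and the 0/1 numbering of the halves are assumptions made for illustration.

```c
#define WM_ITERATIONS 8 /* window/matrix commands per vector */

/* Half of PMEM read by the SPU during iteration 'it' (0-based):
 * first half on even iterations, second half on odd ones. */
int spu_half(int it)
{
    return it % 2;
}

/* Half of PMEM filled by the memory controller during the same
 * iteration: always the half the SPU is not using. */
int fill_half(int it)
{
    return 1 - spu_half(it);
}

/* Count iterations in which both units would touch the same half. */
int count_conflicts(void)
{
    int conflicts = 0;
    for (int it = 0; it < WM_ITERATIONS; it++)
        if (spu_half(it) == fill_half(it))
            conflicts++;
    return conflicts;
}
```

Because the two functions never return the same half for the same iteration, the transfer for the next command can run while the current command executes.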

CPU 150 waits in step 870 until SPU 140 completes the window/matrix command before requesting, in step 875,
that memory controller 180 transfer to DRAM 160 the four vector components just determined by the window/matrix
command. The transfer to DRAM 160 occurs during a time T4. CPU 150 transitions through step 880 back to step 855 and
waits until the 33 components requested in step 865 are transferred to the second half of PMEM 137. Decoding proceeds
as disclosed above except that a second window/matrix command of step 860 operates on the second half of PMEM
137, and a second execution of step 865 requests a transfer of 33 vector components to the first half of PMEM 137. SPU
140 executes the second window/matrix command during a time T5, in parallel with transfer of the previous four
vector components to DRAM 160 during time T4 and then in parallel with transfer of the next set of 33 vector components
from DRAM 160 during time T6, as shown in Fig. 8B.

Steps 855 to 880 are repeated eight times. In each iteration, steps 860 and 865 alternate operating on the first and
second halves of PMEM 137. During an eighth iteration of steps 855 to 880, vector components for windowing the next
vector are requested in step 865 unless the vector is the last vector of the last set of vectors. After the eight iterations
of the window/matrix command, CPU 150 transitions to step 885 and requests transfer of the 32 just determined sound
amplitude values Ai from TMEM 136 to DRAM 160. CPU 150 transitions to step 890 and then step 835 and begins a
dequant/descale command for the next vector in ZMEM 134. A loop from step 835 to step 890 is executed six times to
decode three vectors in each of two channels. After the six vectors are decoded, CPU 150 jumps from step 895 to step
820 to get subband data for a next set of six vectors. A loop from step 820 to step 895 is executed for four sets of vectors.
After four sets of six vectors, new bit allocations and scalefactor indices are needed.
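
The nesting of the loops of Fig. 8A can be summarized by counting how often each command is issued between loads of QMEM 135. This is a hypothetical sketch: the structure and function names are illustrative, the step numbers in the comments follow the description, and waiting and memory transfers are not modeled.

```c
typedef struct {
    int get_subbands;    /* step 820 */
    int dequant_descale; /* step 835 */
    int window_matrix;   /* step 860 */
    int amplitudes;      /* sound amplitude values produced, both channels */
} command_counts;

command_counts count_frame_commands(void)
{
    command_counts n = {0, 0, 0, 0};
    for (int set = 0; set < 4; set++) {        /* loop 820..895 */
        n.get_subbands++;
        for (int vec = 0; vec < 6; vec++) {    /* loop 835..890 */
            n.dequant_descale++;
            for (int wm = 0; wm < 8; wm++)     /* loop 855..880 */
                n.window_matrix++;
            n.amplitudes += 32;                /* step 885: 32 values Ai */
        }
    }
    return n;
}
```

The counts come to 4 get subbands commands, 24 dequant/descale commands, 192 window/matrix commands, and 768 amplitude values, or 384 per channel, which is the number of samples per channel in a layer 1 audio frame.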

Appendix D contains a C code listing of a program which executes the steps of dequantizing, descaling, matrixing, 
and windowing as described above. 

Audio/video decoder 100 (Fig. 1) of this invention also performs video decoding according to the MPEG standard.
Video decoding under the MPEG standard is described in U.S. patent App. serial Nos. 07/890,732 and 07/669,818 which
were incorporated by reference above. VLC/FLC decoder 120 converts codes in a video data stream from decoder FIFO
125 into quantized discrete cosine transformation (DCT) coefficients which are stored in ZMEM 134. For video decoding,
ZMEM 134 is sometimes referred to as a zig-zag memory because of the order in which coefficients are stored. QMEM
135 holds dequantization constants which are swapped into QMEM 135 from DRAM 160 after audio decoding or are
changed according to the video data stream. SPU 140 uses the dequantization constants for dequantizing the DCT
coefficients.

SPU 140 multiplies the dequantized DCT coefficients by a cosine factor and then converts the DCT coefficients to
pixel values by a two-dimensional inverse discrete cosine transformation (IDCT). The two-dimensional IDCT may be
performed as two one-dimensional IDCTs, and TMEM 136 is used to hold intermediate values during the IDCT. After
the IDCT, the resulting error terms are stored into PMEM 137 and then written to DRAM 160. Decoded video is read
from DRAM 160 through blocks 171 to 175 for output on video bus 176.

SPU 140 executes operations including the dequantization, the cosine multiply, and the IDCT described above and
in U.S. patent app. ser. No. 07/890,732. In addition to the blocks shown in Fig. 7A, SPU 140 uses the circuit blocks
shown in Figs. 7B and 7C during video decoding. During a cosine multiply operation, a multiplexer 712 is set to select
a cosine factor from ROM 732, which MAC 750 multiplies by a DCT coefficient. For a dequantization instruction, a
dequantization constant is retrieved from QMEM 135 via a multiplexer 714 and a register 715. Multiplexer 714 selects
either the most or least significant eight bits of a 16-bit signal from QMEM 135. A multiplier 711 scales the
dequantization constant by a value provided by a multiplexer 710. Multiplexer 710 selects either a fixed constant for
the DC term of intra macroblocks or a 5-bit scaling factor from registers 708 and 709. Multiplier 711 provides the scaled
dequantization constant via multiplexer 712 and a register 713 to MAC 750 for multiplication by a DCT coefficient
retrieved from ZMEM 134.

Prior to being asserted to MAC 750, each 9-bit DCT coefficient from ZMEM 134 may be padded, decremented by
decrementer 704, made odd or rounded towards zero by rounder 733, or clipped to a predetermined range by clamp
705, according to the requirements of the MPEG standard. AND gate 702 sets a 9-bit DCT coefficient from ZMEM 134
to zero in response to a control signal "coded". During a video dequantization instruction, multiplexer 703 selects output
signal decrin[10:0] equal to an 11-bit signal formed by padding the 9-bit zQCode[8:0] from gate 702 on the right.
Alternatively, when executing an instruction other than a dequantization instruction, multiplexer 703 selects signal
decrin[10:0] equal to an 11-bit signal SRC3[13:3] from register file 733. Decrementer 704 decrements signal decrin[10:0]
when required by the MPEG standard to provide an output signal decrout[10:0]. If a decrement operation is not required,
signal decrout[10:0] equals signal decrin[10:0].

Rounder 733 replaces bits 0 (the LSB) and 4 of the output datum of signal decrout[10:0] if required by the MPEG
standard. Rounder 733 zeros signal decrout[10:0] if the DCT coefficient from ZMEM 134 is zero, during execution of a
dequantization instruction, or if signal SRC3[13:3] is zero, during execution of a non-dequantization instruction (e.g. a
cosine multiply instruction). Bits [21:14] of signal SRC3 from register file 733 are prefixed to signal decrout[10:0],
resulting in a 19-bit signal CLAMPIN[18:0] into clamp 705. Clamp 705 clamps signal CLAMPIN[18:0] to a 14-bit signal
CLAMPOUT[13:0] having values between -2047 and 2047 during execution of a non-dequantization instruction.
Alternatively, during a dequantization instruction, clamp 705 passes the input signal unchanged. Signal CLAMPOUT[13:0]
is then zero-padded on the right to form a 22-bit signal which passes through multiplexer 706 and register 707 as the
signal X to MAC 750.
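
The clamping and right-padding at the end of this path can be illustrated with two small functions. This is only a sketch of the arithmetic: the function names are hypothetical, the values are modeled as ordinary signed integers, and the width of the padding (eight zero bits, from 14 to 22 bits) is inferred from the signal widths given above.

```c
#include <stdint.h>

/* Clamp to the symmetric range [-limit, limit], as clamp 705 does with
 * limit = 2047 for non-dequantization instructions. */
int32_t clamp_symmetric(int32_t value, int32_t limit)
{
    if (value > limit) return limit;
    if (value < -limit) return -limit;
    return value;
}

/* Zero-pad a 14-bit value on the right to 22 bits: appending eight
 * zero bits is the same as multiplying by 2^8. */
int32_t to_mac_operand(int32_t clampout)
{
    return clampout * 256;
}
```

For example, an input of 5000 is clamped to 2047, and the MAC operand X then carries 2047 x 256.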

MAC 750 can, depending on the instruction executed, multiply two numbers X and Y (e.g. in a dequantization or
cosine multiply instruction), or compute the value of the expression X*Y-Z (e.g. in an IDCT multiply-subtract instruction).
The DCT coefficients are fetched from either ZMEM 134 or TMEM 136 to register file 733. In addition, the resulting value
from MAC 750 can be routed as an operand to butterfly unit 760, bypassing register file 733.

Butterfly unit 760 simultaneously computes the sum and the difference of two input operands X and Y. Since MAC
750 and butterfly unit 760 can each operate on their respective operands in parallel during the execution of a multiply
instruction, a multiply instruction can result in both a multiplication result and a butterfly result. Additionally, a pipeline is
achieved by routing the output value (an "intermediate" result) of MAC 750 directly through multiplexer 718 to butterfly
unit 760. This arrangement increases throughput because the delay caused by loading then reading an intermediate
result in register file 733 is eliminated.
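
The forwarding of a MAC result into the butterfly unit can be pictured as function composition. The sketch below is purely illustrative: the names are hypothetical, fixed-point scaling and clamping are ignored, and the choice of which butterfly input receives the product is an assumption.

```c
#include <stdint.h>

typedef struct {
    int64_t sum;
    int64_t diff;
} butterfly_result;

/* MAC operation X*Y - Z (multiply-subtract). */
int64_t mac_multiply_subtract(int32_t x, int32_t y, int32_t z)
{
    return (int64_t)x * y - z;
}

/* Butterfly: sum and difference of two operands. */
butterfly_result butterfly(int64_t a, int64_t b)
{
    butterfly_result r;
    r.sum = a + b;
    r.diff = a - b;
    return r;
}

/* The product goes straight to the butterfly as one operand, without
 * being stored and re-read as an intermediate value. */
butterfly_result multiply_then_butterfly(int32_t x, int32_t y, int32_t z,
                                         int32_t other)
{
    int64_t product = mac_multiply_subtract(x, y, z);
    return butterfly(other, product);
}
```

With x = 3, y = 4, z = 5 and other = 10, the product is 7 and the butterfly yields a sum of 17 and a difference of 3.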

The results from a butterfly operation of a first pass IDCT are routed into TMEM 136, whereas the results from a
butterfly operation of a second pass IDCT are "clipped" by clamp 729 and routed to PMEM 137. A program
executable by SPU 140 for video decoding is described in U.S. Pat. App. Serial No. 07/890,732.

The MPEG standard does not define an error code that is injected into the audio bit stream because any possible
bit combination can validly appear in the bit stream of a layer 1 or layer 2 audio data frame. Instead, a CD-DSP may
generate a separate error signal for any audio data byte which includes a detected error. The audio/video decoder 100
of Fig. 1 includes an audio error code injector 118 which, when an error signal is received from a CD-DSP, changes a
section of audio data received on serial bus 104 to a bit combination that is rare in audio data frames. Code FIFO 115
is 18 bits wide to store two bytes of coded data, each with a ninth bit for the error signal. Audio error code injector 118
checks the error bit of the coded data and, if the error bit is set, overwrites the byte with an error code. For example, if
decoder 100 receives an error signal while receiving an audio data stream, audio error code injector 118 inserts a
32-bit word-aligned value 7FFD7FFD hexadecimal into an audio data frame. In this case, the error code replaces the byte
with the error and three other bytes. In a statistical study of MPEG data frames for actual sounds, the 32-bit value
$7FFD7FFD was estimated to occur less than once every 100 hours of audio data.
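
The replacement of a flagged byte and its three neighbours can be sketched as follows. This is a software illustration under stated assumptions, not the circuit of injector 118: the function name is hypothetical, the per-byte error bit is modeled as a separate flag array, alignment is taken relative to the start of the buffer, and the code is written most significant byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Error code $7FFD7FFD, stored most significant byte first (assumed). */
static const uint8_t AUDIO_ERROR_CODE[4] = {0x7F, 0xFD, 0x7F, 0xFD};

/* For every byte whose error flag is set, overwrite the aligned 32-bit
 * word that contains it with the error code. */
void inject_audio_error_code(uint8_t *data, const uint8_t *error_flag,
                             size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (!error_flag[i])
            continue;
        size_t base = i & ~(size_t)3; /* start of the aligned word */
        for (size_t k = 0; k < 4 && base + k < len; k++)
            data[base + k] = AUDIO_ERROR_CODE[k];
    }
}
```

A single flagged byte therefore costs four bytes of audio data, which keeps the error code word-aligned for the later comparison.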

Bytes with errors cannot be overwritten with an error code when written into code FIFO 115 because different types
of data stream typically use different error codes, and decoder 100 does not identify the type of data stream containing
the error until the data is removed from code FIFO 115. For example, audio/video decoder 100 may receive an audio
data stream, a video data stream, and a lyric data stream. Errors in the video data stream are overwritten with $000001B4.
Errors in the lyric data stream are overwritten with 32 bits of zero.

When an error signal for an audio data frame is received, host interface 110 inserts a 1 into an 8-bit shift register
that is shifted once for every audio data packet. Accordingly, the value in the shift register is not zero for a number of
audio data frames greater than or equal to the number of shifts required to move the 1 out of the shift register. The value
in the shift register is non-zero for the time that an input audio data buffer in DRAM 160 could contain an error code.
VLC/FLC decoder 120 checks for bit combinations equal to the error code in all audio frames that are decoded while
the value in the shift register is not zero. If the bit combination is detected, VLC/FLC decoder 120 initiates an error
concealment procedure. Bit combinations which are not actual injected error codes are rarely detected because the
chances of the bit combination occurring within a short time interval of an audio frame containing an error are small.
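
The shift register acts as a simple timer that keeps error checking enabled for a bounded number of packets. In the sketch below the type and function names are hypothetical, and the bit position at which the 1 is inserted and the shift direction are assumptions chosen so that eight shifts clear the register.

```c
#include <stdint.h>

typedef struct {
    uint8_t shift_reg; /* 8-bit register, non-zero while checking is enabled */
} error_window;

/* Error signal received: insert a 1 at the least significant bit. */
void error_window_on_error(error_window *w)
{
    w->shift_reg |= 0x01;
}

/* One audio data packet received: shift once, dropping the top bit. */
void error_window_on_packet(error_window *w)
{
    w->shift_reg = (uint8_t)(w->shift_reg << 1);
}

/* The decoder looks for the error code only while this returns non-zero. */
int error_window_active(const error_window *w)
{
    return w->shift_reg != 0;
}
```

After an error, the window stays active for the next seven packets and closes on the eighth, unless another error signal arrives in the meantime.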

The error concealment procedure tries to minimize the effect that the error in the data stream has on sound quality.
For example, if the error code occurs in subband data, VLC/FLC decoder 120 replaces the components corrupted by
the error code with zeros, so that the generated sound is only missing some frequency components. If the error code
corrupts the header or side information of an audio data frame so that the audio data frame cannot be decoded, VLC/FLC
decoder 120 generates an interrupt to CPU 150. CPU 150 can try to reconstruct the missing data using previous audio
data frames or cause SPU 140 to decode again the previous audio data frame for the channel.

Although the present invention has been described with reference to particular embodiments, the description is only
an example of the invention's application and should not be taken as a limitation. Various adaptations and combinations
of features of the embodiments disclosed will be apparent to those skilled in the art and are within the scope of the
present invention as defined by the following claims.



Appendix A
Matrix Coefficients (Nji)

Appendix B
Results of Dequant/Descale Instruction
T0 = (S0-S31)
T1 = (S1-S30)
T2 = (S2-S29)
T3 = (S3-S28)
T4 = (S4-S27)
T5 = (S5-S26)
T6 = (S6-S25)
T7 = (S7-S24)
T8 = (S8-S23)
T9 = (S9-S22)
T10 = (S10-S21)
T11 = (S11-S20)
T12 = (S12-S19)
T13 = (S13-S18)
T14 = (S14-S17)
T15 = (S15-S16)
T16 = (S0+S31)-(S15+S16)
T17 = (S1+S30)-(S14+S17)
T18 = (S2+S29)-(S13+S18)
T19 = (S3+S28)-(S12+S19)
T20 = (S4+S27)-(S11+S20)
T21 = (S5+S26)-(S10+S21)
T22 = (S6+S25)-(S9+S22)
T23 = (S7+S24)-(S8+S23)
T24 = [(S0+S31)+(S15+S16)]-[(S7+S24)+(S8+S23)]
T25 = [(S1+S30)+(S14+S17)]-[(S6+S25)+(S9+S22)]
T26 = [(S2+S29)+(S13+S18)]-[(S5+S26)+(S10+S21)]
T27 = [(S3+S28)+(S12+S19)]-[(S4+S27)+(S11+S20)]
T28 = [(S0+S31)+(S15+S16)]+[(S7+S24)+(S8+S23)]
T29 = [(S1+S30)+(S14+S17)]+[(S6+S25)+(S9+S22)]
T30 = [(S2+S29)+(S13+S18)]+[(S5+S26)+(S10+S21)]
T31 = [(S3+S28)+(S12+S19)]+[(S4+S27)+(S11+S20)]

Appendix C
Matrix Equations
V0 = P*T28 - P*T29 - P*T30 + P*T31
V1 = O*T0 - S*T1 - K*T2 + W*T3 + G*T4 - AA*T5 - C*T6
     + AE*T7 - A*T8 - AC*T9 + E*T10 + Y*T11 - I*T12
     - U*T13 + M*T14 + Q*T15
V2 = N*T16 - V*T17 - F*T18 + AD*T19 - B*T20 - Z*T21
     + J*T22 + R*T23
V3 = M*T0 - Y*T1 - A*T2 + AA*T3 - K*T4 - O*T5 + W*T6
     + C*T7 - AC*T8 + I*T9 + Q*T10 - U*T11 - E*T12
     + AE*T13 - G*T14 - S*T15
V4 = L*T24 - AB*T25 + D*T26 + T*T27
V5 = K*T0 - AE*T1 + I*T2 + M*T3 - AC*T4 + G*T5 + O*T6
     - AA*T7 + E*T8 + Q*T9 - Y*T10 + C*T11 + S*T12
     - W*T13 + A*T14 + U*T15
V6 = J*T16 - AD*T17 + N*T18 + F*T19 - Z*T20 + R*T21
     + B*T22 - V*T23
V7 = I*T0 - AA*T1 + S*T2 - A*T3 - Q*T4 + AC*T5 - K*T6
     - G*T7 + Y*T8 - U*T9 + C*T10 + O*T11 - AE*T12
     + M*T13 + E*T14 - W*T15
V8 = H*T28 - X*T29 + X*T30 - H*T31
V9 = G*T0 - U*T1 + AC*T2 - O*T3 + A*T4 + M*T5 - AA*T6
     + W*T7 - I*T8 - E*T9 + S*T10 - AE*T11 + Q*T12
     - C*T13 - K*T14 + Y*T15
V10 = F*T16 - R*T17 + AD*T18 - V*T19 + J*T20 + B*T21
      - N*T22 + Z*T23
V11 = E*T0 - O*T1 + Y*T2 - AC*T3 + S*T4 - I*T5 - A*T6
      + K*T7 - U*T8 + AE*T9 - W*T10 + M*T11 - C*T12
      - G*T13 + Q*T14 - AA*T15
V12 = D*T24 - L*T25 + T*T26 - AB*T27
V13 = C*T0 - I*T1 + O*T2 - U*T3 + AA*T4 - AE*T5 + Y*T6
      - S*T7 + M*T8 - G*T9 + A*T10 + E*T11 - K*T12 + Q*T13
      - W*T14 + AC*T15
V14 = B*T16 - F*T17 + J*T18 - N*T19 + R*T20 - V*T21
      + Z*T22 - AD*T23
V15 = A*T0 - C*T1 + E*T2 - G*T3 + I*T4 - K*T5 + M*T6
      - O*T7 + Q*T8 - S*T9 + U*T10 - W*T11 + Y*T12
      - AA*T13 + AC*T14 - AE*T15
V16 = 0
V17 = -V15
V18 = -V14
V19 = -V13
V20 = -V12
V21 = -V11
V22 = -V10
V23 = -V9
V24 = -V8
V25 = -V7
V26 = -V6
V27 = -V5
V28 = -V4
V29 = -V3
V30 = -V2
V31 = -V1
V32 = -V0
V33 = -Q*T0 + M*T1 + U*T2 - I*T3 - Y*T4 + E*T5 + AC*T6
      - A*T7 - AE*T8 - C*T9 + AA*T10 + G*T11 - W*T12
      - K*T13 + S*T14 + O*T15
V34 = -R*T16 + J*T17 + Z*T18 - B*T19 - AD*T20 - F*T21
      + V*T22 + N*T23
V35 = -S*T0 + G*T1 + AE*T2 + E*T3 - U*T4 - Q*T5 + I*T6
      + AC*T7 + C*T8 - W*T9 - O*T10 + K*T11 + AA*T12
      + A*T13 - Y*T14 - M*T15
V36 = -T*T24 + D*T25 + AB*T26 + L*T27
V37 = -U*T0 + A*T1 + W*T2 + S*T3 - C*T4 - Y*T5 - Q*T6
      + E*T7 + AA*T8 + O*T9 - G*T10 - AC*T11 - M*T12
      + I*T13 + AE*T14 + K*T15
V38 = -V*T16 - B*T17 + R*T18 + Z*T19 + F*T20 - N*T21
      - AD*T22 - J*T23
V39 = -W*T0 - E*T1 + M*T2 + AE*T3 + O*T4 - C*T5 - U*T6
      - Y*T7 - G*T8 + K*T9 + AC*T10 + Q*T11 - A*T12
      - S*T13 - AA*T14 - I*T15
V40 = -X*T28 - H*T29 + H*T30 + X*T31
V41 = -Y*T0 - K*T1 + C*T2 + Q*T3 + AE*T4 + S*T5 + E*T6
      - I*T7 - W*T8 - AA*T9 - M*T10 + A*T11 + O*T12
      + AC*T13 + U*T14 + G*T15
V42 = -Z*T16 - N*T17 - B*T18 + J*T19 + V*T20 + AD*T21
      + R*T22 + F*T23
V43 = -AA*T0 - Q*T1 - G*T2 + C*T3 + M*T4 + W*T5 + AE*T6
      + U*T7 + K*T8 + A*T9 - I*T10 - S*T11 - AC*T12
      - Y*T13 - O*T14 - E*T15
V44 = -AB*T24 - T*T25 - L*T26 - D*T27
V45 = -AC*T0 - W*T1 - Q*T2 - K*T3 - E*T4 + A*T5 + G*T6
      + M*T7 + S*T8 + Y*T9 + AE*T10 + AA*T11 + U*T12
      + O*T13 + I*T14 + C*T15
V46 = -AD*T16 - Z*T17 - V*T18 - R*T19 - N*T20 - J*T21
      - F*T22 - B*T23
V47 = -AE*T0 - AC*T1 - AA*T2 - Y*T3 - W*T4 - U*T5
      - S*T6 - Q*T7 - O*T8 - M*T9 - K*T10 - I*T11 - G*T12
      - E*T13 - C*T14 - A*T15
V48 = -1*T28 - 1*T29 - 1*T30 - 1*T31
V49 = V47
V50 = V46
V51 = V45
V52 = V44
V53 = V43
V54 = V42
V55 = V41
V56 = V40
V57 = V39
V58 = V38
V59 = V37
V60 = V36
V61 = V35
V62 = V34
V63 = V33
V17 through V48 are stored in PMEM in reverse order.

Appendix D

/* Audio Section of CL480 Signal Processing Unit: */
/* Written by Dave Galbi */

#include <math.h>     /* floor, pow, cos */
#include "global.h"
#include "spurom.h"
#include "command.h"

void spua(int cmd, int pbank) { /* cmd=command being executed,
                                   pbank=pmem bank being used by SPU */
  int i,k;             /* loop counters for subband */
  int zdata;           /* subband samples read from zmem */
  int bitalloc,scf;    /* bit allocation and scale factor indices */
  int dequant;         /* dequantized subband sample */
  int descale;         /* subband sample descaled by 2^(-1/3) or 2^(-2/3) */
  int sb[32];          /* fully descaled subband samples */
  int u;               /* unpacked input for windowing
                          (called u[i] in MPEG spec) */
  int paddr;           /* address of pmem location with MSBs of u[i] */
  int waddr1,waddr2;   /* address of window coefficients */
  int a,b,c,d;         /* matrixing results */
  static int zvec=0;   /* indicates which of 6 vectors in
                          zmem is being processed */
  static int mindex=0; /* indicates which of 8
                          WINDOW_MATRIX sections is being processed */

  /* compute x*y where x is s1.20, y is s1.18 and the
     result is s1.20 */
#define mul(x,y) floor((double)x*(double)y/0x40000+.5)
  /* clamp to s1.20 */
#define clamp22(x) CLAMP(x, 0x1FFFFF, -0x200000)
  /* butterfly: the difference is formed first so that sum may
     overwrite one of the inputs */
#define bfly_and_clamp(sum,diff,x,y) {diff = clamp22(x-y); sum = clamp22(x+y);}

  if (cmd==DEQUANT_DESCALE) {
    for(i=0; i<32; i++) {
      zdata = zmem[JOIN3(zvec,2,1, i,4,0, zvec,0,0)] << 3;
                                  /* zmem is s.15, zdata is s.18 */
      /* LSB of zvec selects between left channel (0)
         and right channel (1) */
      bitalloc = BIT(zvec,0) ? BITS(qmem[i+32],15,8) :
                               BITS(qmem[i+32],7,0);
      dequant = mul(coeffc[bitalloc], (zdata + coeffd[bitalloc]));
      scf = BIT(zvec,0) ? BITS(qmem[i],15,8) :
                          BITS(qmem[i],7,0);
      scf = BIT(scf,7) ? 0 : BIT(scf,6) ? 63 : scf; /* clamp scf to [63,0] */
      descale = mul(dequant, nint(pow(2., -(scf%3)/3.)*0x40000));
      sb[i] = mul(descale, nint(pow(2., 1.-scf/3)*0x40000));
    }
    zvec = (zvec+1)%6;

    /* T0..T15 to tmem; pair sums replace sb[16..31] */
    for(i=0; i<16; i++)
      bfly_and_clamp(sb[31-i], tmem[i], sb[i], sb[31-i]);
    /* T16..T23 to tmem; quad sums replace sb[24..31] */
    for(i=16; i<24; i++)
      bfly_and_clamp(sb[47-i], tmem[i], sb[47-i], sb[i]);
    /* T24..T27 and T28..T31 to tmem */
    for(i=24; i<28; i++)
      bfly_and_clamp(tmem[4+i], tmem[i], sb[55-i], sb[i]);
  }

  /* compute a + x*y where a and x are s1.20, y is s1.18
     and the result is s1.20 */
#define mac_and_clamp(a,x,y) \
  clamp22(a+(int)floor((double)x*(double)y/0x40000+.5))

  /* s1.18 matrix coefficients */
#define N(i,k) \
  nint(cos((16+i)*(2*k+1)*M_PI/64)*0x40000)

  if (cmd==WINDOW_MATRIX) {
    /* Perform 1/8 of windowing for a vector: tmem =
       tmem + pmem*coeffw */
    waddr1 = 64*(7-mindex) + 32;
    waddr2 = 64*(7-mindex) + 63;
    for(i=0; i<33; i++) {
      if (mindex==0 && i<17) tmem[32+i]=0;
                  /* reset tmem[32:48] when start new vector */
      if (mindex==0 && i>0 && i<16) tmem[64-i]=0;
                  /* reset tmem[49:63] when start new vector */
      paddr = 2*(i + (i+3)/4) + pbank*128;
      switch (i%4) { /* unpack s.19 from pmem and
                        multiply by 2 to get s.20 format */
        case 0: u = 2*(JOIN3(pmem[paddr],7,4,
                  pmem[paddr+2],7,0, pmem[paddr+3],7,0)); break;
        case 1: u = 2*(JOIN3(pmem[paddr],7,0,
                  pmem[paddr+1],7,0, pmem[paddr+3],3,0)); break;
        case 2: u = 2*(JOIN3(pmem[paddr],7,0,
                  pmem[paddr+1],7,4, pmem[paddr+3],7,0)); break;
        case 3: u = 2*(JOIN3(pmem[paddr],7,0,
                  pmem[paddr+2],3,0, pmem[paddr+3],7,0)); break;
      }
      u -= BIT(u,20) ? 0x200000 : 0; /* extend sign of
                                        data read from pmem */
      if (i<32) {
        tmem[32+i] =
          mac_and_clamp(tmem[32+i],u,coeffw[waddr1++]);
        if (i==16) waddr1 -= 32;
      }
      if (i>0 && i!=16) {
        tmem[64-i] =
          mac_and_clamp(tmem[64-i],u*(i>15?-1:1),coeffw[waddr2--]);
        if (i==15) waddr2 -= 33;
      }
    }

    if (mindex==7) /* convert decoded samples to s.15 at
                      end of last windowing step */
      for(i=32; i<64; i++) {
        if (tmem[i]<0) tmem[i] += 31; /* round toward
                          zero because of conformance test */
        tmem[i] = BITS(
          CLAMP(tmem[i], 0xFFFFF, -0x100000), 20, 5);
      }

    /* Perform 1/8 of matrixing for a vector: pmem =
       SUM(tmem*Nik) */
    a = b = c = d = 0;
    for(k=0; k<16; k++) {
      a = mac_and_clamp(a, tmem[k], N(17+mindex*4,k));
      c = mac_and_clamp(c, tmem[k], N(19+mindex*4,k));
    }
    for(k=0; k<8; k++) b = mac_and_clamp(b,
      tmem[k+16], N(18+mindex*4,k));
    for(k=0; k<4; k++) d = mac_and_clamp(d,
      tmem[k+24+(mindex%2)*4], N(20+mindex*4,k));
    mindex = (mindex+1)%8;

    /* Clamp matrix results to s.20 */
    a = CLAMP(a, 0xFFFFF, -0x100000);
    b = CLAMP(b, 0xFFFFF, -0x100000);
    c = CLAMP(c, 0xFFFFF, -0x100000);
    d = CLAMP(d, 0xFFFFF, -0x100000);

    /* Pack 4 matrix results into pmem */
    pmem[pbank*128]   = BITS(a,20,13);
    pmem[pbank*128+1] = BITS(a,12,5);
    pmem[pbank*128+2] = BITS(b,20,13);
    pmem[pbank*128+3] = JOIN2(b,12,9, a,4,1);
    pmem[pbank*128+4] = BITS(c,20,13);
    pmem[pbank*128+5] = BITS(b,8,1);
    pmem[pbank*128+6] = JOIN2(d,20,17, c,12,9);
    pmem[pbank*128+7] = BITS(c,8,1);
    pmem[pbank*128+8] = BITS(d,16,9);
    pmem[pbank*128+9] = BITS(d,8,1);
  }
}



/* CL480 Signal Processing Unit: */
/* Written by Dave Galbi */

#include <math.h>     /* cos */
#include <stdio.h>    /* printf */
#include "global.h"
#include "command.h"

void spu(void) {
  int i,j;          /* row and column of coefficient in block */
  int addr;         /* address of coefficient in block */
  int qac;          /* quantized AC coefficient */
  int qaddr,qdata;  /* qmem address and qmem data */
  int tmp;          /* temporary variable */
  int dmac;         /* result of dmac operation */
  double cosine;    /* cosine in floating-point */
  int pass;         /* 1 for IDCT from zmem to tmem, 2 for
                       IDCT from tmem to pmem */
  int A0,B0,C0,D0,AA0,BB0,CC0,DD0,AAA0,BBB0,CCC0,DDD0;
                    /* intermediate IDCT results */
  int A1,B1,C1,D1,AA1,BB1,CC1,DD1,AAA1,BBB1,CCC1,DDD1;
                    /* intermediate IDCT results */
  int iA,iB,iC,iAA,iBB;        /* intermediate IDCT results
                                  from imac operations */
  static int cmd=0;            /* command being processed by
                                  SPU (0 is idle) */
  static int clock=0;          /* number of clocks since
                                  last SPU command was started */
  static int coded=7;          /* SPU_coded register,
                                  indicates which blocks are coded */
  static int intra1,intra0;    /* SPU_intra register, intra
                                  bit for new or previous macroblock */
  static int quant1,quant0;    /* SPU_quant register, quant
                                  for new or previous macroblock */
  static int iaddr;            /* address for indirect SPU
                                  registers */
  static int taddr;            /* tmem read/write address */
  static int zaddr;            /* zmem read address */
  static int zblock=0;         /* block of zmem being
                                  processed by IDCT */
  static int pbank=0;          /* pmem bank being used by
                                  SPU, each bank holds 2 8x8 blocks */
  static int oldgsel=511;      /* value of gsel on previous
                                  clock */

  if (!BIT(gsel,6)) /* read gbus register */
    switch (BITS(gsel,5,0)) {
      case SPU_cmd:   gbus_n = JOIN4(cmd!=0,0,0,
                        zero,6,0, iaddr,3,0, cmd,3,0); break;
      case SPU_coded: gbus_n = coded; break;
      case SPU_intra: gbus_n = JOIN2(intra1,0,0,
                        intra0,0,0); break;
      case SPU_quant: gbus_n = JOIN2(quant1,4,0,
                        quant0,4,0); break;
      case SPU_tmem:  gbus_n =
                        BITS(tmem[32+taddr++],15,0);
                      /* only non-test mode is supported,
                         which reads addr 32 to 63 */
                      taddr %= 32; break;
      case SPU_zaddr: gbus_n = zaddr; break;
      case SPU_zmem:  gbus_n = zmem[zaddr++];
                      zaddr %= 256; break;
      case SPU_idata:
        switch (iaddr) {
          case SPU_QMEM:  gbus_n =
                            qmem[JOIN2(intra0,0,0, zaddr++,4,0)];
                          zaddr %= 256; break;
          case SPU_PBANK: gbus_n = pbank; break;
        }
        break;
    }

  if (BIT(oldgsel,6)) /* write gbus register */
    switch (BITS(oldgsel,5,0)) {
      case SPU_cmd:   cmd = BITS(gbus,3,0);
                      iaddr = BITS(gbus,7,4)<<4;
                      /* invert pbank on WINDOW_MATRIX and
                         every other IDCT instruction */
                      pbank ^= BIT(cmd,0) & (cmd>0);
                      clock = 0; break;
      case SPU_coded: coded = JOIN2(coded,2,0,
                        gbus,0,0); break;
      case SPU_intra: intra0 = BIT(gbus,0); break;
      case SPU_quant: quant0 = BITS(gbus,4,0); break;
      case SPU_zaddr: zblock = BITS(gbus,7,6);
                      /* zblock is the address for stage 2
                         in both hardware and C-model */
                      zaddr = BITS(gbus,7,0); break;
      case SPU_tmem:  tmem[32+taddr++] = gbus;
                      /* only non-test mode is supported, which
                         writes addresses 32 to 63 */
                      taddr %= 32; break;
      case SPU_idata:
        switch (iaddr) {
          case SPU_QMEM:  qmem[JOIN2(intra0,0,0,
                            zaddr++,4,0)] = gbus;
                          zaddr %= 256; break;
          case SPU_TADDR: taddr = BITS(gbus,5,1);
                          if (!BIT(gbus,0))
                            printf("WARNING: SPU_TADDR[0] should be written with one.\n");
                          break;
          case SPU_PBANK: pbank = BIT(gbus,0); break;
        }
        break;
    }

oldgsel = gsel;

if (cmd==DEQUANT_DESCALE && clock++==152)
  { spua(cmd, pbank); cmd=0; }

if (cmd==WINDOW_MATRIX && clock++==127)
  { spua(cmd, pbank); cmd=0; }

if (cmd>=1 & cmd<=6 && clock++==214) { /* Perform IDCT command */
  if (cmd==3) {
    intra1 = intra0;
    quant1 = quant0; }

  for(pass=2; pass>0; pass--) {
    for(i=0; i<8; i++) {
      if (pass==1) for(j=0; j<8; j++) { /* Perform dequantization and cosine multiply */
        addr = BIT(cmd,0) ? i*8 + j : j*8 + i; /* do pass 1 on rows when cmd is odd */
        qac = BIT(coded,3) * zmem[addr + zblock*64];
        if (!BIT(intra1,0) & qac<0) qac--;

        /* Upper half of qmem is quantizer for intra, lower half is for non-intra */
        qaddr = JOIN2(intra1,0,0, addr,5,0);
        qdata = BIT(qaddr,0) ? BITS(qmem[qaddr>>1],15,8) : BITS(qmem[qaddr>>1],7,0);

        /* Do not use quantizer scale factor for the DC term of an intra MB */
        tmp = (qaddr==64) ? 8*qdata : quant1*qdata;

        /* Result of dmac instruction should be 2x value of dmac computed below */
        dmac = (tmp * (2*qac + BIT(~intra1 & qac!=0,0)) + (qac<0)*0xf) >> 4;
        cosine = cos((i?i:4)*M_PI/16) * cos((j?j:4)*M_PI/16);

        /* Round cosine to 18 fraction bits */
        cosine = floor(cosine*(1<<18) + .5) / (1<<18);

        /* Decrement dmac if it is positive and is not the intra DC */
        if (dmac>0 & qaddr!=64) dmac--;

        /* Replace LSB of dmac */
        tmp = (dmac&~1) + BIT(qaddr!=64 & dmac!=0,0);

        /* Limit range of tmp to [2047,-2048] and shift left */
        tmp = CLAMP(tmp,2047,-2048) << 8;

        /* Increment DC term at bit 7 so truncated output of IDCT is rounded */
        tmem[addr] = floor(cosine * tmp + .5) + ((addr==0) ? 1<<9 : 0);
      }

#define X(index) tmem[BIT(cmd,0) ? i*8 + index : index*8 + i]
#define bfly(sum,diff,x,y) sum=x+y; diff=x-y;
#define imac(a,b,c,d) cosine = floor(cos(d*M_PI/8)*(1<<19) + .5) / (1<<18); \
                      a = floor(c*cosine + .5) - b;
      bfly(A0,A1, X(1),X(7));
      bfly(B0,B1, X(3),X(5));
      bfly(C0,C1, X(2),X(6));
      bfly(D0,D1, X(0),X(4));
      imac(iA, A0,A1,1);
      imac(iB, B0,B1,3);
      imac(iC, C0,C1,2);
      bfly(AA0,AA1, A0,B0);
      bfly(BB0,BB1, iA,iB);
      bfly(CC0,CC1, D1,iC);
      bfly(DD0,DD1, D0,C0);
      imac(iAA, BB0,AA1,2);
      imac(iBB, AA0,BB1,2);
      bfly(X(0),X(7), DD0,AA0);
      bfly(X(1),X(6), CC0,BB0);
      bfly(X(2),X(5), CC1,iAA);
      bfly(X(3),X(4), DD1,iBB);
    }

    if (pass==2)
      for(i=0; i<64; i++) { /* copy tmem to pmem */
        /* chroma blocks (cmd=2 or 3) are interleaved in pmem, luma blocks are not */
        /* for luma, cmd[0] determines which of the 2 blocks in a bank is written */
        addr = (cmd!=2 & cmd!=3) ? ((i&7) | ((i<<1) & 0xf0) | ((i>>4) & 8)) + 8*(cmd&1) :
               ((cmd==2) ? 2*i : 2*i + 1);

        /* intra blocks are clamped to [255,0], non-intra are clamped to [255,-256] */
        pmem[addr + pbank*128] = CLAMP(tmem[i]>>10,255,intra1?0:-256);
      }
  }

  if (BIT(coded,3))
    zblock = (zblock+1) % 3;
  coded = JOIN2(coded,2,0, coded,0,0);
  cmd=0;
}
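
The listing above relies on bit-field helpers (BIT, BITS, JOIN2, JOIN4, CLAMP) whose definitions fall outside this part of the text. The following is a minimal sketch of semantics consistent with how they are used, assuming that BITS(x,hi,lo) extracts an inclusive bit range, that JOIN2 concatenates two such fields with the first field in the upper bits, and that CLAMP takes its upper bound before its lower bound. The lowercase function names are hypothetical stand-ins, not the listing's actual macros.

```c
/* Hypothetical stand-ins for the listing's BIT/BITS/JOIN2/CLAMP helpers.
   Their semantics are inferred from usage and are an assumption. */

/* Extract bits hi..lo (inclusive) of x as an unsigned field. */
unsigned bits(unsigned x, int hi, int lo)
{
    unsigned width = (unsigned)(hi - lo + 1);
    unsigned mask = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (x >> lo) & mask;
}

/* Extract the single bit n of x. */
unsigned bit(unsigned x, int n)
{
    return bits(x, n, n);
}

/* Concatenate field a[ahi:alo] (upper bits) with field b[bhi:blo] (lower bits). */
unsigned join2(unsigned a, int ahi, int alo,
               unsigned b, int bhi, int blo)
{
    return (bits(a, ahi, alo) << (bhi - blo + 1)) | bits(b, bhi, blo);
}

/* Limit x to the range [lo, hi]; the upper bound comes first,
   as in CLAMP(tmp, 2047, -2048). */
int clamp(int x, int hi, int lo)
{
    return x > hi ? hi : (x < lo ? lo : x);
}
```

Under these assumptions join2(1,0,0, 0,5,0) evaluates to 64, which agrees with the listing's test of qaddr against 64 for the DC term of an intra macroblock.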



Claims 

1. An integrated audio/video decoder comprising:

a first internal memory; 

means for alternately writing a block of audio data to the first memory then writing a block of video data to 
the first memory; and 

a signal processing unit which alternately decodes audio data from the first memory then decodes video data 
from the first memory. 

2. The decoder of claim 1, wherein the signal processing unit further comprises:

a multiplier which multiplies IDCT coefficients and dequantization coefficients during video decoding and
multiplies components of quantized sample vectors and dequantization coefficients during audio decoding; and

a butterfly unit which determines sums and differences of IDCT coefficients for an inverse discrete cosine 
transformation during video decoding and determines sums and differences of components of a frequency-domain 
sample vector during audio decoding. 

3. A method for decoding an MPEG audio data frame, comprising the steps of: 

decoding subband data from an audio data frame to generate a first vector, the first vector having components 
which represent frequency-domain components of a sound sample; 

combining two or more components of the first vector, using a butterfly unit; 
determining a product of the combination and a matrixing coefficient, using a multiplier; 
accumulating the product into a memory location; and 

repeating the combining, determining, and accumulating steps one or more times to determine a component 
of a second vector, the second vector having components which represent frequency-domain components of a 
sound sample. 
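
The combining, multiplying, and accumulating steps of claim 3 can be sketched in C. The sketch assumes the matrixing coefficients of the MPEG-1 audio standard, N[i][k] = cos((16+i)(2k+1)pi/64). Because N[i][31-k] = (-1)^i N[i][k], forming the sum or difference of components k and 31-k first halves the number of multiplies per output component. The pairing shown, the function names, and the cosine table are illustrative assumptions and not necessarily the schedule used by the decoder.

```c
#define NSUB 32 /* components of the frequency-domain vector */
#define NOUT 64 /* components produced by matrixing */

/* cos(m*pi/64) for m = 0..127, filled by init_cos_table(). */
double cos_table[128];

/* Build the table with cos((m+1)t) = 2cos(t)cos(mt) - cos((m-1)t). */
void init_cos_table(void)
{
    const double c1 = 0.99879545620517241; /* cos(pi/64) */
    int m;
    cos_table[0] = 1.0;
    cos_table[1] = c1;
    for (m = 2; m < 128; m++)
        cos_table[m] = 2.0 * c1 * cos_table[m - 1] - cos_table[m - 2];
}

/* Matrixing coefficient cos((16+i)(2k+1)pi/64); the angle index wraps at 128. */
double coeff(int i, int k)
{
    return cos_table[((16 + i) * (2 * k + 1)) % 128];
}

/* Reference form: 32 multiplies per output component. */
void matrix_direct(const double s[NSUB], double v[NOUT])
{
    int i, k;
    for (i = 0; i < NOUT; i++) {
        v[i] = 0.0;
        for (k = 0; k < NSUB; k++)
            v[i] += coeff(i, k) * s[k];
    }
}

/* Butterfly first, then 16 multiply-accumulates per output component. */
void matrix_butterfly(const double s[NSUB], double v[NOUT])
{
    double sum[NSUB / 2], dif[NSUB / 2];
    int i, k;
    for (k = 0; k < NSUB / 2; k++) {        /* combining step */
        sum[k] = s[k] + s[NSUB - 1 - k];
        dif[k] = s[k] - s[NSUB - 1 - k];
    }
    for (i = 0; i < NOUT; i++) {
        v[i] = 0.0;
        for (k = 0; k < NSUB / 2; k++)      /* multiply and accumulate */
            v[i] += coeff(i, k) * ((i & 1) ? dif[k] : sum[k]);
    }
}

/* Largest absolute difference between two vectors of length n. */
double max_abs_diff(const double *a, const double *b, int n)
{
    double worst = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        double d = a[i] - b[i];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Even-numbered outputs use the sums and odd-numbered outputs use the differences, so both routines should agree to within rounding error for any input vector.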

4. The method of claim 3, further comprising: 

repeating the combining, determining, accumulating, and repeating steps of claim 3 a plurality of times to 
generate a linearly independent set of components of the second vector; 
storing components of the second vector in a second memory; 

retrieving from the second memory components of other vectors, wherein the other vectors each have components
which represent frequency-domain components of a sound sample; and

combining the components of the second vector with the components of the other vectors to generate sound 
amplitude values. 

5. The method of claim 4, wherein the step of storing components of the second vector comprises storing only components
which are linearly independent of each other.

6. The method of claims 4 or 5, wherein the step of retrieving components of the other vectors comprises retrieving 
only components which are linearly independent of each other. 

7. A method for generating sound amplitude values from data following the MPEG encoding standard, comprising the 
steps of: 
transferring a block containing components from a time-domain vector to a first memory from a second memory;

determining products of each of the components in the block by a corresponding windowing coefficient; and

accumulating the products in a plurality of sums, each sum corresponding to a different sound amplitude value.

8. The method of claim 7, wherein the step of transferring the block further comprises transferring 17 components from
a first time-domain vector and 16 components from a second time-domain vector.

9. The method of claims 7 or 8, wherein: 

the step of determining products comprises performing 64 multiplications, each multiplication involving one 
of the components from the block and a windowing coefficient; and 

the step of accumulating comprises adding a pair of the products to each of 32 sums. 

10. The method of claim 9, further comprising: 

multiplying each of a series of matrixing coefficients by a corresponding combination of components of a
frequency-domain vector; and

accumulating the products to generate four components of a time-domain vector;
writing the four components of the time-domain vector to the second memory.

11. The method of claim 10, further comprising repeating the steps of claims 7, 8, 9, and 10 eight times wherein no two
steps of transferring a block transfer components from the same pair of time-domain vectors.
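
The eight repetitions of claim 11 can be pictured with the synthesis window of the MPEG-1 audio standard, in which each of 32 output samples is a sum of 16 windowed products. The following is a minimal sketch, assuming the standard's layout of a 1024-entry buffer v holding sixteen 64-component time-domain vectors and a 512-entry window table d that the caller must supply. The names are hypothetical, and the sketch reads 32 components from each vector directly rather than modeling the 17 and 16 stored components of claim 8.

```c
#define NSAMPLES 32   /* sound amplitude values produced */
#define NPASSES   8   /* repetitions, 64 multiplications each */

/* One pass: 64 multiplications, a pair of products added to each of 32 sums.
   Pass p reads the first half of vector 2p and the second half of vector 2p+1. */
void window_pass(double sum[NSAMPLES], const double v[1024],
                 const double d[512], int p)
{
    int j;
    for (j = 0; j < NSAMPLES; j++) {
        sum[j] += v[128 * p + j]      * d[64 * p + j];
        sum[j] += v[128 * p + 96 + j] * d[64 * p + 32 + j];
    }
}

/* All eight passes: 512 multiplications in total. */
void window_all(double sum[NSAMPLES], const double v[1024], const double d[512])
{
    int j, p;
    for (j = 0; j < NSAMPLES; j++)
        sum[j] = 0.0;
    for (p = 0; p < NPASSES; p++)
        window_pass(sum, v, d, p);
}

/* Reference form: build the 512-entry vector U from V,
   then sum U*D in strides of 32. */
void window_reference(double out[NSAMPLES], const double v[1024],
                      const double d[512])
{
    double u[512];
    int i, j;
    for (i = 0; i < 8; i++)
        for (j = 0; j < 32; j++) {
            u[64 * i + j]      = v[128 * i + j];
            u[64 * i + 32 + j] = v[128 * i + 96 + j];
        }
    for (j = 0; j < 32; j++) {
        out[j] = 0.0;
        for (i = 0; i < 16; i++)
            out[j] += u[j + 32 * i] * d[j + 32 * i];
    }
}
```

Each pass touches a different pair of vectors, so the eight passes together cover all sixteen vectors in the buffer exactly once.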

12. A degrouping circuit comprising: 

a first multiplexer; 
a second multiplexer; 

a divider having an input bus operably connected to receive from the first and second multiplexer a signal 
representing a dividend; 

a first register having an input bus coupled to the divider to receive a signal representing a remainder and 
an output bus coupled to a first input bus of the first multiplexer; and 

a second register having an input bus coupled to the divider to receive a signal representing a quotient and 
an output bus coupled to an input bus of the first multiplexer. 

13. The degrouping circuit of claim 12, further comprising:

a third register having an input bus coupled to the output bus of the second register and an output bus coupled
to a second input bus of the first multiplexer; and

a third multiplexer having a first input bus coupled to the output bus of the first register and a second input
bus coupled to the output bus of the third register.

14. The degrouping circuit of claims 12 or 13, wherein the divider further comprises select terminals for selecting a 
divisor. 

15. The degrouping circuit of claim 14, wherein the divider further comprises: 

a read-only memory having a data bus coupled to the input buses of the first and second registers; and 
an address generator coupled between the input bus of the divider and an address bus of the read-only 
memory. 

16. The degrouping circuit of claim 15, wherein:

the address generator asserts an address signal to the read-only memory; 

when a signal on the select terminals has a first value, the address signal equals the signal representing the 
dividend; and 

when the signal on the select terminals has a second value, the address signal equals a logical combination 
of the signal representing the dividend and the signal on the select terminals. 

17. The degrouping circuit of claim 15, wherein: 

the first multiplexer asserts a first 4-bit signal to the address generator;

the second multiplexer asserts a second 4-bit signal to the address generator; 

the address generator asserts an 8-bit address signal to the read only memory; 

when a signal on the select terminals has a first value, the four most significant bits of address signal equals
the first 4-bit signal, and the four least significant bits of address signal equals the second 4-bit signal;

when the signal on the select terminals has a second value, the most significant bit of address signal equals 
one, and the seven least significant bits of address signal equal a combination of the three least significant bits of 
each of the first and the second 4-bit signals; and 

when the signal on the select terminals has a third value, the most significant bit of address signal equals 
one, and the seven least significant bits of address signal are constant or equal to a combination of the two least 
significant bits of each of the first and the second 4-bit signals. 

18. The degrouping circuit of claim 14, wherein the divider further comprises: 

a divide-by-three circuit;

a divide-by-five circuit; 
a divide-by-nine circuit; 

a multiplexer having select leads coupled to the select terminals, an input bus coupled to the input bus of the 
divider, a first output bus coupled to the divide-by-three circuit, a second output bus coupled to the divide-by-five 
circuit, and a third output bus coupled to the divide-by-nine circuit. 
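
In MPEG-1 Layer II audio, three consecutive samples of a subband may be packed into one code as s0 + n*s1 + n*n*s2 with n equal to 3, 5, or 9, which is why the divider needs those three divisors. The following is a minimal sketch of degrouping with two divisions, assuming that packing. The function name is hypothetical, and the sketch models only the arithmetic, not the multiplexers, registers, or clock timing of the claimed circuit.

```c
/* Split a grouped subband code into three samples using two divisions.
   nlevels is the divisor selected for the subband: 3, 5 or 9. */
void degroup(unsigned code, unsigned nlevels, unsigned sample[3])
{
    sample[0] = code % nlevels;   /* remainder of the first division */
    code /= nlevels;              /* quotient feeds the second division */
    sample[1] = code % nlevels;   /* remainder of the second division */
    sample[2] = code / nlevels;   /* quotient of the second division */
}
```

For example, with n = 5 the code 86 = 1 + 5*2 + 25*3 degroups to the samples 1, 2, and 3.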

19. A method for decoding a digital data stream containing an error, the method comprising the steps of: 

transmitting a digital data stream from a data source to a decoder; 

asserting an error signal from the data source to the decoder when the data source detects an error; 

replacing a portion of data in a digital data stream with an error code when the error signal and the portion 
of data are received by the decoder; 

asserting a flag signal in the decoder to enable replacing of bit combinations which are in the data stream 
and equal to the error code; 

changing the data stream by replacing a bit combination which is in the data stream and equal to the error 
code; and 

decoding the changed data stream. 
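
The steps of claim 19 can be sketched as two small routines: one that stores incoming data and substitutes an error code when the source flags an error, and one that scans the stored data while the flag is set. The 16-bit word size, the ERROR_CODE value, and the choice to replace matches with zero are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define ERROR_CODE 0x7FFEu /* hypothetical value, chosen only for this sketch */

/* Store one received word. If the source asserted its error signal, store
   the error code instead and raise the flag that enables replacement. */
void receive_word(uint16_t *slot, uint16_t word, int error_signal, int *flag)
{
    if (error_signal) {
        *slot = ERROR_CODE;
        *flag = 1;
    } else {
        *slot = word;
    }
}

/* Before decoding: while the flag is set, replace every word equal to the
   error code (here with zero). Returns the number of words replaced. */
size_t replace_error_codes(uint16_t *buf, size_t n, int flag)
{
    size_t i, replaced = 0;
    if (!flag)
        return 0;
    for (i = 0; i < n; i++) {
        if (buf[i] == ERROR_CODE) {
            buf[i] = 0;
            replaced++;
        }
    }
    return replaced;
}
```

Note that genuine data which happens to equal the error code is also replaced while the flag is set, so the code should be a bit combination that rarely occurs in valid data.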

20. The method of claim 19, wherein: 

the error code is a valid bit combination in an errorless data stream; and 

the method further comprises leaving the flag set for a time and then deasserting the flag to disable replacing 
bit combinations which are in the data stream and equal to the error code. 

21. The method of claim 20, wherein bit combinations equal to the error code are sufficiently rare in an errorless data 
stream that replacing bit combinations which are in an errorless data stream and equal to the error code does not 
noticeably change decoded data. 

22. The method of claim 20, wherein the digital data stream is an audio data stream which follows the MPEG encoding
standard.

23. The method of claim 22, wherein the step of changing the data stream further comprises replacing subband data 
with zeros. 

24. The method of claim 22, wherein the step of changing the data stream further comprises: 

replacing subband data with zeros when a bit combination equal to the error code is in the subband data; and 
replacing a first audio data frame with a previous audio data frame when a bit combination equal to the error
code is at least partly in the header or side information of the first data frame.

25. The method of claim 22, wherein the step of changing the data stream further comprises replacing the bit combination 
with similar data derived from one or more previous audio data frames. 

26. The method of one of claims 20 to 25, further comprising: 

writing a 1 into a value in a shift register when the error signal is asserted; and 
shifting the value in the shift register periodically, wherein 

asserting the flag signal further comprises asserting the flag signal when the value in the shift register is not zero.

27. The method of claim 26, further comprising writing the changed data stream into a buffer, wherein shifting the value
further comprises shifting the value at a rate such that a non-zero bit remains in the shift register for a time greater than or
equal to a time that an error code remains in the buffer.
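
The flag timing of claims 26 and 27 can be sketched with an integer standing in for the shift register. The 8-bit width, the bit position at which the 1 is written, and the shift direction are assumptions. In a real design the width and shift period would be chosen so that a 1 outlasts the time an error code can remain in the buffer.

```c
#include <stdint.h>

#define SHIFT_BITS 8 /* hypothetical register width */

/* Write a 1 into the top bit of the register when the error signal is asserted. */
uint8_t on_error(uint8_t reg)
{
    return (uint8_t)(reg | (1u << (SHIFT_BITS - 1)));
}

/* Periodic shift: each call moves every bit one place toward bit 0,
   and the bit in position 0 is discarded. */
uint8_t on_tick(uint8_t reg)
{
    return (uint8_t)(reg >> 1);
}

/* The flag is asserted while any 1 remains in the register. */
int flag_asserted(uint8_t reg)
{
    return reg != 0;
}
```

With these choices the flag clears on the eighth shift after the most recent error, and a new error during that time restarts the interval.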